A New Boosting Algorithm for Classification on Distributed Databases

Authors

  • Nguyen Thi Van Uyen
  • Seung Gwan Lee
  • TaeChoong Chung
Abstract

In this paper, we propose a new boosting algorithm for distributed databases. The main idea is to exploit the parallelism of the distributed databases to build an ensemble of classifiers. In each round of the algorithm, every site processes its own data locally and computes the information needed for that round. A central site collects this information from all sites and builds a global classifier, which becomes one member of the ensemble. Each distributed site then uses this global classifier to compute the information required for the next round. Repeating this process produces, from the distributed databases, an ensemble of classifiers that is almost identical to the one built on the whole data. Experiments were performed on five datasets from the UCI repository [9]. The results show that the accuracy of the proposed algorithm is almost equal to or higher than the accuracy obtained by applying the boosting algorithm to the whole database.
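The round structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes AdaBoost with decision stumps as the weak learner, a fixed list of candidate stumps known to every site, and labels in {-1, +1}. All names (`local_errors`, `local_reweight`, `distributed_adaboost`, the site dictionaries) are hypothetical.

```python
import math

def stump_predict(stump, x):
    # A stump is (feature index, threshold, polarity); it predicts +1 or -1.
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def local_errors(site, candidates):
    # Runs at one site: weighted error of every candidate stump on local data.
    return [sum(w for x, y, w in zip(site["X"], site["y"], site["w"])
                if stump_predict(s, x) != y)
            for s in candidates]

def local_reweight(site, stump, alpha):
    # Runs at one site: update local weights using the broadcast global stump.
    site["w"] = [w * math.exp(-alpha * y * stump_predict(stump, x))
                 for x, y, w in zip(site["X"], site["y"], site["w"])]
    return sum(site["w"])  # local weight mass, reported to the central site

def distributed_adaboost(sites, candidates, rounds):
    # Assumption: the central site only ever sees per-site summaries.
    total = sum(len(s["y"]) for s in sites)
    for s in sites:
        s["w"] = [1.0 / total] * len(s["y"])
    ensemble = []
    for _ in range(rounds):
        # Central site: add up the per-site errors and pick the best stump.
        per_site = [local_errors(s, candidates) for s in sites]
        errors = [sum(col) for col in zip(*per_site)]
        best = min(range(len(candidates)), key=errors.__getitem__)
        err = min(max(errors[best], 1e-12), 1 - 1e-12)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, candidates[best]))
        # Sites reweight locally; the center only needs the total mass.
        mass = sum(local_reweight(s, candidates[best], alpha) for s in sites)
        for s in sites:
            s["w"] = [w / mass for w in s["w"]]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

Under these assumptions the summed per-site errors equal the weighted errors on the pooled data (up to floating-point rounding), so the stump selected in each round matches what centralized AdaBoost would pick from the same candidates. Only summaries cross the network: one error value per candidate and one weight mass per site per round.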


Similar resources

Optimization of majority protocol for controlling transactions concurrency in distributed databases by multi-agent systems

In this paper, we propose a new concurrency control algorithm based on multi-agent systems which is an extension of majority protocol. Then, we suggest a clustering approach to get better results in reliability, decreasing message passing and algorithm’s runtime. Here, we consider n different transactions working on non-conflict data items. Considering execution efficiency of some different...


Boosting Localized Classifiers in Heterogeneous Databases

Combining multiple global models (e.g. back-propagation based neural networks) is an effective technique for improving classification accuracy. This technique reduces variance by manipulating the distribution of the training data. In many large scale data analysis problems involving heterogeneous databases with attribute instability, standard boosting methods can be improved by coalescing multi...


Adaptive boosting techniques in heterogeneous and spatial databases

Combining multiple classifiers is an effective technique for improving classification accuracy by reducing the variance through manipulating the training data distributions. In many large-scale data analysis problems involving heterogeneous databases with attribute instability, however, standard boosting methods do not improve local classifiers (e.g. k-nearest neighbors) due to their low sensit...


Using Non-Archimedean DEA Models for Classification of DMUs: A New Algorithm

A new algorithm for classification of DMUs into efficient and inefficient units in data envelopment analysis is presented. This algorithm uses the non-Archimedean Charnes-Cooper-Rhodes [1] (CCR) model. Also, it applies an assurance value for the non-Archimedean epsilon using only simple computations on inputs and outputs of DMUs (see [18]). The convergence and efficiency of the ne...


A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

The K nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...




Publication date: 2008